During first SSH connection, wait for routable IP #2070
maximenoel8 wants to merge 28 commits into uyuni-project:master
type        = "pty"
target_port = "0"
target_type = "serial"
source_host = null
Why were those lines removed?
Answer:
They are redundant defaults that can occasionally cause validation warnings or clutter with recent OpenTofu/Terraform providers.
Instead of forcing IPv4, I would first like to understand why using IPv6 fails in that particular environment.
Pull request overview
Adjusts the libvirt host provisioning logic to prefer an IPv4 address for the initial SSH/provisioner connection, addressing failures when Terraform attempts to connect via an IPv6 link-local (fe80::/10) address.
Changes:
- Update connection.host selection to prioritize IPv4 addresses and avoid fe80:: link-local IPv6 addresses.
- Consolidate multiple remote-exec provisioners into a single provisioner with sequential commands.
- Minor cleanups/formatting adjustments in backend_modules/libvirt/host/main.tf.
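A minimal sketch of the consolidation step: one remote-exec provisioner whose inline list holds the commands that previously lived in separate provisioner blocks. The resource name and both commands are hypothetical placeholders, not the actual contents of backend_modules/libvirt/host/main.tf; the connection block is omitted for brevity.

```hcl
# Hypothetical sketch: a single remote-exec provisioner instead of several.
resource "terraform_data" "provisioning" {
  # connection { ... } omitted; it would carry the host selection logic.

  provisioner "remote-exec" {
    # The inline entries are executed in the order they are listed.
    inline = [
      "echo 'step 1: hypothetical setup command'",
      "echo 'step 2: hypothetical configuration command'",
    ]
  }
}
```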
The Linux kernel will always prefer IPv6 over IPv4 when there is a choice, so it is expected to use IPv6 whenever possible, and that should work. I would also like to understand the situation better.
Bischoff left a comment
Please do not merge this, at least for now.
From what I see after some quick debugging, there is a problem at the network level that I need to solve.
I updated the PR description.
Moving it to draft; the new version is not working yet.
OK, the changes are working.
Please change the title(s); we are not forcing IPv4 anymore.
Pull request overview
Copilot reviewed 5 out of 5 changed files in this pull request and generated 6 comments.
Instead of using external scripts, maybe we can use the same local-exec + qemu-monitor-command mechanism (with a different call) that I used in my PR for the libvirt provider example. See:
1181955 to f10a0a8
Thanks for the reference @srbarrios! I looked at your example in dmacvicar/terraform-provider-libvirt#1288. The
The core difference is that a static sleep isn't reliable enough for our CI environment: DHCP timing varies, and we need to know early if a routable IP never appears, rather than getting a generic SSH timeout later. Embedding a multi-step polling loop with grep/awk/virsh error handling inside an HCL heredoc would work, but it becomes difficult to read and debug. Keeping the logic in
…ying on Terraform state.
…y IP retrieval logic.
I am not sure if this is strictly related, but I remember that when we updated the terraform-libvirt-provider we had similar issues with IPv6 link-local addresses being used, especially for retail. Technically, the provider we build in OBS should be patched so that it does not consider an interface ready when a link-local address is the only one available. Can you double-check that:
Problem
During the provisioning phase of test deployments, OpenTofu/Terraform was intermittently failing with the error:
dial tcp [fe80::...]:22: connect: invalid argument
This occurred because the libvirt_domain resource often reports the IPv6 Link-Local address (fe80::/10) via the QEMU agent before the Global IPv6 or DHCP IPv4 addresses are fully assigned. Link-Local addresses are only valid on a single network link and cannot be dialed without a zone index (for example fe80::1%eth0), so they are not reachable from the Jenkins worker and the connection fails.
Additionally, the previous logic could pick up IPv4 Link-Local (APIPA, 169.254.0.0/16) addresses, which would lead to connection timeouts.
Why the loop
To resolve the "empty host" race condition without relying on arbitrary sleep timers, a dynamic waiter has been introduced: it polls until a routable IP address is reported and fails early if none appears.
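As discussed in the review, the PR keeps the waiter in a separate script, so the following is only a condensed inline sketch of the same idea: poll the QEMU guest agent until an address that is neither link-local nor loopback appears, and fail with an explicit message after a bounded time instead of a generic SSH timeout. It assumes a libvirt_domain resource named domain, a running guest agent, and that virsh on the runner talks to the same hypervisor (for example via LIBVIRT_DEFAULT_URI). The resource name wait_for_routable_ip and the 60 attempts, 5 seconds apart, are arbitrary choices for illustration.

```hcl
# Condensed sketch only: the PR keeps this polling loop in an external script.
# Assumes libvirt_domain.domain exists and the QEMU guest agent is running.
resource "terraform_data" "wait_for_routable_ip" {
  provisioner "local-exec" {
    interpreter = ["/bin/bash", "-c"]
    environment = {
      DOMAIN = libvirt_domain.domain.name
    }
    command = <<-EOT
      for attempt in $(seq 1 60); do
        # Field 4 of each virsh domifaddr row is the address with its prefix.
        # Drop link-local (fe80, 169.254) and loopback (127., ::1) entries,
        # then succeed as soon as any address-shaped value remains.
        if virsh domifaddr "$DOMAIN" --source agent 2>/dev/null |
             awk '{print $4}' |
             grep -Ev '^(fe80|169\.254|127\.|::1/)' |
             grep -Eq '^[0-9a-fA-F:.]+/[0-9]+$'; then
          echo "Routable address found for $DOMAIN (attempt $attempt)"
          exit 0
        fi
        sleep 5
      done
      echo "No routable address reported for $DOMAIN after 60 attempts" >&2
      exit 1
    EOT
  }
}
```

The final grep doubles as a header filter: the table header yields "Protocol" in field 4 and the separator line yields an empty field, and neither matches the address pattern.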
What this PR does
Updated the connection block in the terraform_data.provisioning resource to use a strict filter for the host attribute.
Logic: The new logic iterates through all addresses reported by the VM's first network interface and excludes any string starting with fe80 (IPv6 Link-Local) or 169.254 (IPv4 Link-Local).
Result: The provisioner now only attempts to connect to routable Global IPv6 or IPv4 addresses.
Safety: Removed the fallback to 127.0.0.1. If no routable address is found, the host now evaluates to null. This prevents the provisioner from "masking" the failure by attempting to SSH into the local runner/bastion host.
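The host filter described above could look roughly like this. It is a sketch, not the PR's exact code: it assumes the domain resource is libvirt_domain.domain, that the provider exposes the reported addresses as network_interface[0].addresses, root SSH login, and a Terraform 1.3+ or OpenTofu release (for startswith()). The remote-exec command is a hypothetical placeholder.

```hcl
locals {
  # Assumed attribute path: every address reported for the VM's first NIC.
  nic_addresses = libvirt_domain.domain.network_interface[0].addresses

  # Keep only routable addresses: drop IPv6 link-local (fe80...) and
  # IPv4 link-local/APIPA (169.254...), mirroring the prefix checks above.
  routable_addresses = [
    for addr in local.nic_addresses : addr
    if !startswith(lower(addr), "fe80") && !startswith(addr, "169.254")
  ]
}

resource "terraform_data" "provisioning" {
  connection {
    type = "ssh"
    user = "root" # assumption
    # First routable address, or null when none exists. There is deliberately
    # no 127.0.0.1 fallback, so a missing IP cannot reach the local runner.
    host = try(local.routable_addresses[0], null)
  }

  provisioner "remote-exec" {
    inline = ["hostname"] # hypothetical placeholder command
  }
}
```

try() is used here only for brevity: indexing an empty list is an error, and try() turns that error into null, which matches an explicit length(local.routable_addresses) > 0 check.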
Depends on SUSE/susemanager-ci#1934